Abstract

We review the use of Monte Carlo (MC) simulation to model backgrounds to top
signal at the Tevatron experiments, CDF and D0, as well as the relevant
measurements done by the experiments. We'll concentrate on the modeling of W
and Z boson production in association with jets, in particular heavy flavor
(HF) jets, and also comment on the Tevatron experience using matched MC.
1 Introduction



The Fermilab Tevatron Collider has provided over 4 fb⁻¹ of pp̄ collisions at
√s = 1.96 TeV, allowing the CDF and D0 experiments to make precise
measurements using tt̄ production, and to find evidence for the rare single top
production process. Both endeavors require a solid understanding of the
background processes, and MC simulation is a crucial ingredient of the
background models used.



2 The background processes 



Precision measurements of top quark properties are performed by studying tt̄
production in either:

• the dilepton decay channel, where the t → bW decays are followed by
W → ℓν(X) and ℓ is an electron or muon, or in

• the semileptonic ("lepton plus jets") decay channel, where one of the W
bosons decays as W → ℓν(X) and the other decays hadronically.

Fig 1 shows typical sample compositions in these channels. 

The dominant background in the dilepton channels is Drell-Yan plus jets
production: Z → e⁺e⁻ in the e⁺e⁻ channel, Z → μ⁺μ⁻ in the μ⁺μ⁻ channel,
and Z → τ⁺τ⁻ with subsequent leptonic τ decays in the eμ channel. In these
proceedings we'll follow the common practice of referring to this background as
"Z+jets". This background dominates the early stages of the event selection,
when the experimental understanding of the samples is verified, e.g., by
examining many distributions for control samples with fewer jets than the
signal. But after the final selection, its contribution is small. Dilepton
samples are quite pure, hence most analyses do not rely on b tagging and the
precise flavor composition of the Z+jets background is not important.

Figure 1: Examples of sample composition in top pair analyses for the dilepton
channel [1] (left) and the lepton plus jets channel [2] (right). The "DY" label
refers to Drell-Yan production, as does the "Z+jets" label. The right plot is
taken from a search for resonant tt̄ production, and shows (in white) a
conceivable new physics component.

The second largest background is multijet production, often referred to as
"QCD" background. Multijet events are selected when jets are misreconstructed
as leptons. It is quite difficult to simulate these mistaken reconstructions
both at the MC generator level and at the detector simulation level. Therefore
data-driven models are used for these backgrounds (see also sec 7). The next
background component is from diboson plus jets production. These backgrounds
are quite small, so even a rough simulation suffices for top physics, and they
are estimated purely from MC.

In the lepton plus jets channel, the dominant background is W+jets production,
which is important both in the control samples and in the signal samples.
This channel provides the most precise measurements, and most measurements
use b tagging to suppress background [3]. As a result, the flavor composition of
the jets produced in association with a W boson is relevant in this channel.

The search for single top production is characterized by a high level of
background, as the experimental signature of this process contains fewer jets.
Thus the single top analyses use b tagging to suppress background. A typical
sample composition is shown in fig 2. These samples are dominated by W+jets
production, and knowledge of the flavor composition of the jets produced in
association with a W boson is required to identify the small single-top signal.
Figure 2: Sample composition in CDF's single top search. The "mistag" label
refers to W plus light-flavored jets production.



3 Matched MC in V+jets

The calculation of the differential cross sections for W+jets and Z+jets
processes (V+jets), and in particular W plus heavy flavor (W+HF) production, is
far from trivial. It was the motivation for the development of the ALPGEN event
generator [4], and recent calculations show that sizable NLO corrections exist
for some final states [5].

Production of hard additional partons is well simulated by matrix element
(ME) generators that calculate 2 → n processes at tree level, such as ALPGEN.
But parton shower (PS) MC generators, such as PYTHIA [6], are better at
simulating softer radiation, as the PS approximates the sum of soft
contributions from all orders in perturbation theory. Hence these tools are
used together, the hard 2 → n interactions being modeled by the ME generator,
and the showering by the PS generator. Care must be taken to avoid double
counting final states, for example, those where the 3rd hardest parton can be
generated either by the ME or by the PS. This is done using a matching
prescription, discussed elsewhere [7].

The CDF and D0 collaborations both generate V+jets MC with ALPGEN
using the MLM matching prescription [8], with some small differences in the
matching technology. Since W+HF production is important for top physics,
both collaborations produce such samples separately. But these samples overlap
with the W+jets samples, which include heavy flavor jets in the PSs, and this
overlap must be removed. The CDF collaboration does so by classifying bb̄ and
cc̄ pairs into those that are in the same parton jet and those that are not. The
former are taken only from the PS MC (HERWIG [9]), and the latter only from the
ME MC (ALPGEN). This has the advantage of playing to each MC's strength.
The D0 collaboration uses the more straightforward solution of discarding any
events that were generated as W+jets by the ME MC (ALPGEN) and contain
heavy-flavor jets added by the PS MC (PYTHIA).
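
The two overlap-removal strategies described above can be sketched as simple
event vetoes. This is only an illustration under stated assumptions: the
`Event` record and all of its field names are hypothetical, and neither
collaboration's actual software is reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Toy generator-level record; every field name here is hypothetical."""
    me_hf_pair: bool = False        # the ME produced a b-bbar or c-cbar pair
    me_pair_same_jet: bool = False  # ...and both quarks fall in one parton jet
    ps_hf_pair: bool = False        # the PS added a b-bbar or c-cbar pair
    ps_pair_same_jet: bool = False  # ...and both quarks fall in one parton jet


def keep_d0_style(ev: Event) -> bool:
    """Drop events generated as W+light partons whose shower added heavy flavor."""
    generated_as_light = not ev.me_hf_pair
    return not (generated_as_light and ev.ps_hf_pair)


def keep_cdf_style(ev: Event) -> bool:
    """Take same-jet pairs only from the PS and different-jet pairs only from the ME."""
    if ev.me_hf_pair and ev.me_pair_same_jet:
        return False  # same-jet pairs are left to the parton shower
    if ev.ps_hf_pair and not ev.ps_pair_same_jet:
        return False  # well separated pairs are left to the matrix element
    return True
```

For example, a light-parton ME event in which the shower adds a collinear
heavy-flavor pair is vetoed by the first function but kept by the second.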

Other differences are in the pT cut used for the matching within each sample
(15 GeV in CDF, 8 GeV in D0), which has little effect, in the light-parton jet
multiplicities produced for each sample (up to 4 in CDF, up to 5 in D0), and
in the treatment of W+c production (separate in CDF, included in W+LF in
D0).

Figure 3: Vertex mass fit for tagged jets in selected sample of ref [11].

Figure 4: Measured ratio σ(W + c-jet)/σ(W + jets) from ref [13].



4 Measurements of V+jets processes 

Given the difficulties in calculating and simulating V+jets processes, it is
instructive to compare them to data. In this section we review measurements
of V+jets production from CDF and D0. The leptonic W and Z decay channels
provide clear experimental signatures and are used throughout. Since the
additional jets are produced by the strong interaction, which favors soft and
collinear radiation, selection cuts on energies and angles have a large effect
on the cross sections. Relevant selection cuts will be stressed in this section.

Both collaborations have preliminary results from measurements of W + b-jet
production. The D0 collaboration set a limit of σ(pp̄ → Wbb̄) < 4.6 pb
at 95% C.L. using 382 pb⁻¹ of data [10]. The jets' pT was required to be above
20 GeV and their direction to satisfy |η| < 2.0, and only events with one or
two jets were used. On the first day of the conference, the CDF collaboration
released preliminary results from a measurement of the b-jet production cross
section in association with a W boson:
σ_b-jets(pp̄ → W + b-jets) · B(W → ℓν) = 2.74 ± 0.27 (stat.) ± 0.42 (syst.) pb.
The dataset used in this measurement had an integrated luminosity of
1.9 fb⁻¹ [11] (see fig 3). Jets were reconstructed with R_cone = 0.4, and
counted as b jets if ΔR_bj < 0.4, E_T > 20 GeV, and |η| < 2. The measured cross
section is significantly higher than the ALPGEN prediction of 0.78 pb.
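
To make the size of this discrepancy concrete, the arithmetic can be sketched
as below. The sketch assumes that the statistical and systematic uncertainties
are uncorrelated and can be added in quadrature, and it assigns no uncertainty
to the prediction; both are simplifying assumptions of this illustration, not
statements taken from ref [11].

```python
import math


def combine_in_quadrature(*uncertainties):
    """Quadrature sum, valid only for uncorrelated uncertainties (assumed here)."""
    return math.sqrt(sum(u * u for u in uncertainties))


measured_pb = 2.74   # CDF measurement quoted in the text
stat_pb = 0.27
syst_pb = 0.42
predicted_pb = 0.78  # ALPGEN prediction quoted in the text

total_pb = combine_in_quadrature(stat_pb, syst_pb)
ratio = measured_pb / predicted_pb
# Difference in units of the experimental uncertainty only (no theory error).
pull = (measured_pb - predicted_pb) / total_pb

print(f"total uncertainty = {total_pb:.2f} pb")
print(f"data / prediction = {ratio:.1f}")
print(f"difference / uncertainty = {pull:.1f}")
```

Under these assumptions the measurement is about 3.5 times the prediction, and
the difference is close to four times the experimental uncertainty.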

Both collaborations studied the rate of W + c-jet production. The CDF
collaboration measured σ(pp̄ → W + c-jet) · B(W → ℓν) = 9.8 ± 2.8 (stat.)
(asymmetric syst., see ref [12]) ± 0.6 (lumi.) pb using 1.8 fb⁻¹ of data [12].
The c-jet pT was required to be above 8 GeV and its direction to satisfy
|η| < 3.0. A recent preliminary result from the D0 collaboration was shown at
the conference: they measure the ratio

    R = σ(pp̄ → W + c-jet) / σ(pp̄ → W + jets),    (1)

and find R = (7.4 ± 1.9 (stat.) (asymmetric syst., see ref [13])) % using
1 fb⁻¹ of data [13] (see also fig 4). The jets' pT was required to be above
20 GeV and their direction to satisfy |η| < 2.5. The measured fraction is
higher than the ALPGEN prediction of (4.4 ± 0.3 (PDF)) %.

Finally, the CDF collaboration measured the differential W+jets production
cross section as a function of the number of jets and the jet transverse energy
using 320 pb⁻¹ of data [14]. Jets are required to have |η| < 2.0. The measured
cross sections are compared to next-to-leading order predictions and to
predictions from two matched MC generators.

5 Modeling Z+jets production as a background 

Z+jets production appears at a lower rate than W+jets production, but has
much less background, making it a good process for tuning the simulations.
Usually it suffices to normalize simulated cross sections according to cross
sections calculated at next-to-leading order (NLO) by the MCFM program [15],
though next-to-next-to-leading order calculations are also used sometimes. As
noted above, the strong dependence of the cross sections on the kinematic cuts
must be taken into account. Some analyses normalize the total rate to data,
for example, ref [16], where the apparent data vs. MC discrepancy for W plus
a few jets production can be resolved either by jet energy calibration effects
or by the appropriate choice of the renormalization and factorization scales.

The kinematics of Z+jets production can also be tuned to data. Recently
the D0 collaboration noted that RESBOS [17] calculations match their observed
dσ/dpT distributions well [18] (see fig 5), and is starting to use RESBOS as a
surrogate for the data, reweighting ALPGEN+PYTHIA MC so it agrees with the pT
spectrum predicted by RESBOS. This reweighting is also carried over to W+jets
production. During the conference, ALPGEN authors commented that the need for
this reweighting may be due to the tuning of ALPGEN parameters used at D0, as
ALPGEN with the default parameters agrees with RESBOS [19].
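
A reweighting of this kind can be sketched as a per-bin ratio of normalized
spectra. The bin edges and histogram contents below are invented toy numbers,
and the function names are hypothetical; this is a minimal illustration of the
idea, not the D0 procedure.

```python
from bisect import bisect_right


def normalized(hist):
    """Scale a histogram to unit area so that only shapes are compared."""
    total = float(sum(hist))
    return [h / total for h in hist]


def reweighting_factors(mc_hist, ref_hist):
    """Per-bin ratio of the reference shape to the MC shape."""
    mc = normalized(mc_hist)
    ref = normalized(ref_hist)
    return [r / m if m > 0 else 1.0 for r, m in zip(ref, mc)]


def event_weight(pt, edges, factors):
    """Look up the factor for one event; events outside the binning keep weight 1."""
    i = bisect_right(edges, pt) - 1
    if i < 0 or i >= len(factors):
        return 1.0
    return factors[i]


# Toy inputs (assumptions): boson pT bin edges in GeV and event counts per bin.
edges = [0.0, 10.0, 20.0, 40.0, 100.0]
mc_hist = [50, 30, 15, 5]    # stand-in for the matched MC spectrum
ref_hist = [40, 30, 20, 10]  # stand-in for the reference spectrum

factors = reweighting_factors(mc_hist, ref_hist)
```

Applying `event_weight` to every simulated event changes the shape of the pT
spectrum while leaving the overall normalization of the sample unchanged.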

The D0 collaboration also compared differential Z+jets cross sections between
data and the predictions of the SHERPA [20] and PYTHIA event generators [21].
As expected, since PYTHIA is a parton shower generator it does not
generate sufficient additional radiation, while SHERPA simulates these aspects
adequately. Some inaccuracies are also evident in the PYTHIA simulation of the
unsigned rapidity difference between the two leading jets. It is interesting to
note that again, SHERPA simulates the distribution adequately.

Figure 5: Normalized differential cross section as a function of transverse
momentum for the inclusive sample in ref [18].



The differential Z+jets cross sections were also measured by the CDF collab- 
oration, which compared the data both to NLO calculations (performed using 
MCFM) and to different matched MCs [22]. They found excellent agreement 
between the data and the NLO calculations of the cross sections as a function 

6 Modeling W+jets production as background

Top quark measurements by the CDF and D0 collaborations model W+jets
production on the basis of the differential distributions predicted by matched
ALPGEN MC. There are some indications that small corrections to the differential
distributions may be required, and these are treated as systematic uncertainties
in some analyses (e.g. ref [23]). On the other hand, there is a clear need
for correcting the predicted integrated W+jets and W+HF cross sections,
and these are normalized to data, after other backgrounds (multijets, dibosons,
etc.) are subtracted. Typically, W+jets production is normalized to data before
b tagging, and the fraction of W+HF in the total W+jets production is then
fitted to data after b tagging.
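
The two-step normalization can be illustrated with a simplified counting
version, in which the heavy-flavor scale factor is solved for in closed form
rather than fitted. All yields, fractions and tagging efficiencies below are
invented toy numbers, and the function names are hypothetical; the sketch also
assumes that scaling the HF fraction leaves the total W+jets yield unchanged.

```python
def wjets_pretag_yield(n_data, n_other):
    """W+jets yield before b tagging: data minus the other backgrounds."""
    return n_data - n_other


def hf_scale_factor(n_tag_data, n_tag_other, n_w_pretag, f_hf_mc, eff_hf, eff_lf):
    """Solve n_tag_w = n_w * [K f_hf eff_hf + (1 - K f_hf) eff_lf] for K."""
    tagged_fraction = (n_tag_data - n_tag_other) / n_w_pretag
    return (tagged_fraction - eff_lf) / (f_hf_mc * (eff_hf - eff_lf))


# Toy inputs (assumptions, not measured values).
n_w = wjets_pretag_yield(n_data=12000, n_other=2000)
k_hf = hf_scale_factor(
    n_tag_data=434,   # events with a b tag in data
    n_tag_other=100,  # tagged events from the other backgrounds
    n_w_pretag=n_w,
    f_hf_mc=0.05,     # HF fraction predicted by the simulation
    eff_hf=0.40,      # per-event tagging efficiency for W+HF
    eff_lf=0.01,      # per-event mistag rate for W+LF
)
print(f"W+jets pretag yield = {n_w}, K_HF = {k_hf:.2f}")
```

With these toy inputs the simulated HF fraction would need to be scaled up by
about 20% to reproduce the tagged yield.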

In D0 analyses the W+jets normalization differs from analysis to analysis.
It is determined either by counting events with one or two jets, or by fitting a
discriminant in tt̄ signal samples (with ≥ 3 jets). The fraction of heavy flavor
in the W+jets sample was normalized to data using the number of events with no
b-tagged jets [24]. This yielded a correction of K_HF = 1.5 ± 0.45 (see fig 6)
to be applied to the heavy flavor fraction simulated by ALPGEN. Later analyses
used tighter selection cuts and normalized the fraction of events with no b tags
(rather than their absolute number). Tests for systematic effects revealed that
this factor is sensitive to the other backgrounds in these samples, and to the
jet selection. The resulting normalization was K_HF = 1.17 ± 0.18. Oddly,
switching from ALPGEN version 2.05 to version 2.12 changed the W+HF cross
section by a factor of ≈ 2, which together with more minor improvements to the
analyses yielded a new value of K_HF = 1.9 ± 0.3.

Figure 6: Measurements of K_HF from ref [24]. The points are the measured
correction factor in each dataset. The solid line is the average of these
values. The dot-dash inner band shows the uncertainty from the fit to the eight
data points. The dashed outer line shows the uncertainty used in the analysis.

In CDF analyses, distributions of jet-flavor discriminating variables such as
the output of a neural network b tagger or the mass of the reconstructed
secondary vertex in W + 1 jet data are fit to the sum of light, charm, and
bottom jet templates. This yields K_HF = 1.4 ± 0.4. The W+LF component of the
b-tagged samples is then determined by applying b-tagging rates either to the
data before b tagging or to W+LF MC samples. To date, K_bb̄/K_cc̄ = 1 and
K_c = 1 are used in all Tevatron top quark measurements as they are consistent
with the data.

In searches for physics beyond the SM in top samples, the data often allow
for significant non-standard production. When the V+jets background is
normalized to data in the signal samples (e.g. ≥ 3 jets for tt̄), the possible
non-standard production can affect the measured normalization. For example, in
the searches for resonant top-pair production this is explicitly accounted
for [25].



7 MC use in the modeling of multijet background 

Background from multijet production with a fake lepton is modeled using various
data-driven techniques with little use of MC inputs. Still, there is a place
for MC in the modeling of multijet background. The data samples on which
these estimations are based are typically dominated by three-jet events that are
reconstructed as a lepton and two jets. As three-jet production can be easily
generated by MC techniques, such samples can be used to verify that the
data-driven techniques work as intended.



8 D0 experience in using matched MC

Most of the Tevatron experience with using matched MC is with ALPGEN, as at
the time it was the only matched MC that could be run, integrated with the
experiments' software, and be mass produced. Both experiments produced a
wide range of physics results using ALPGEN. But this success did not come
without some difficulties, and a few lessons may be learned from D0's
experience.

The D0 collaboration overlays data collected with no trigger bias over the
simulated hard scatters, to simulate additional interactions, pileup effects,
noise, etc. As the Tevatron luminosity increases, it is desirable to overlay
both older data and the very latest data. But data quality issues can arise at
the late stages of data analysis, and data that was overlaid over the MC may
later be classified as bad. Thus D0 removes events from the MC samples if their
overlaid data was of bad quality. The HF removal described in sec 3 is also
performed in this post-processing step.

This contributes to the problem of long turnaround times. Once a new feature
is put into the MC, it waits for the MC authors to make a software release,
then the experiment needs to build and verify its software using the new MC
version, the samples need to be produced (lots of events are needed with one or
no extra jets, and generating events with many extra jets is slow), the
post-production described above is done, and finally the new samples must be
propagated through the physics analyses. Overall, six to twelve months pass
before a change in the MC is evaluated. The long turnaround times have made
even small mistakes, such as in setting random seeds, very costly. This limits
our ability to generate sufficient samples to study systematics.

When using matched MC, the different parton-jet bins must be combined
with the correct weights. These weights have a wide range, which complicates
the statistical analysis of the simulated background. This wide range is
unavoidable when simulating extra jet production, as more detailed simulation of
the rare processes with many extra jets is needed. The weights are also sample
dependent, and so depend also on the post-processing described above.
Therefore the simulated samples must be frozen, resulting in difficult
bookkeeping which is further complicated by the need for generating Z+jets MC
in different mass bins. A possible lesson is that MC production should be
designed to avoid any post-processing that changes the matching weights. E.g.
in order to avoid changes due to data quality, it may be possible to overlay
the same set of data events on top of all MC samples to be matched together.
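
The dependence of the weights on post-processing can be sketched as follows.
The sketch assumes the common convention that each parton-jet bin is weighted
by its cross section times the integrated luminosity divided by the number of
surviving simulated events; the cross sections, event counts and function
names are invented for illustration.

```python
def matching_weight(sigma_pb, lumi_pb_inv, n_events):
    """Per-event weight that scales one parton-jet bin to the data luminosity."""
    return sigma_pb * lumi_pb_inv / n_events


def effective_events(weights_and_counts):
    """Equivalent number of unweighted events, (sum w)^2 / (sum w^2)."""
    sum_w = sum(w * n for w, n in weights_and_counts)
    sum_w2 = sum(w * w * n for w, n in weights_and_counts)
    return sum_w ** 2 / sum_w2


LUMI_PB_INV = 1000.0  # toy integrated luminosity
# Toy samples: extra light partons -> (cross section in pb, simulated events).
SAMPLES = {0: (1000.0, 1_000_000), 1: (300.0, 500_000), 2: (60.0, 200_000)}

weights = {
    n: matching_weight(sigma, LUMI_PB_INV, count)
    for n, (sigma, count) in SAMPLES.items()
}
n_eff = effective_events([(weights[n], SAMPLES[n][1]) for n in SAMPLES])

# If post-processing discards 10% of the one-parton events at random, that
# bin's weight must be recomputed from the surviving count.
weight_after_removal = matching_weight(300.0, LUMI_PB_INV, 450_000)
```

Because the weight of a bin changes whenever its surviving event count
changes, every analysis has to use exactly the sample version for which the
weights were computed.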

9 Conclusions 

Modeling W+jets and Z+jets backgrounds purely from the simulation is
insufficient, and additional inputs from data are required. Though a generic
solution can work for most analyses, some analyses can make do without the most
sophisticated treatments, and some (especially new physics searches) have their
own unique requirements. Several approaches are used to estimate the heavy
flavor contributions, and the overall W+jets contributions. The data indicate
that W+jets and in particular W+HF production is more copious than
predicted by ALPGEN.

Matched ALPGEN MC has been used extensively for the last couple of years 
and was able to meet all our physics needs. Some possible inaccuracies have 
been identified, in particular in jet angular variables, and some technical lessons 
can be learned. Other generators seem promising, but have received much less 
scrutiny at the Tevatron. 

References 

[1] CDF Note 9271,
http://www-cdf.fnal.gov/physics/new/top/2008/xsection/ttbar_dil_btag/

[2] D0 Note 5600,
http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/TOP/T65

[3] Palencia E., Tools for top physics at CDF, in these proceedings;
Harel A., Tools for top physics at D0, in these proceedings.

[4] Mangano M.L., Moretti M. and Pittau R., Nucl. Phys. B, 632 (2002) 343.

[5] Campbell J., Ellis R.K., Maltoni F. and Willenbrock S., Phys. Rev. D, 75
(2007) 054015.

[6] Sjöstrand T., Mrenna S. and Skands P., J. High Energy Phys., 05 (2006)
026.

[7] E.g. Maltoni F., Generators for top physics: ME+PS and NLO, in these
proceedings.

[8] Mangano M.L. et al., J. High Energy Phys., 01 (2007) 013.

[9] Corcella G. et al., J. High Energy Phys., 01 (2001) 010.

[10] D0 Note 4896,
http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/HIGGS/H12

[11] CDF Note 9321,
http://www-cdf.fnal.gov/physics/new/hdg/results/wbjets_080111/

[12] CDF Collaboration, Aaltonen T. et al., Phys. Rev. Lett., 100 (2008)
091803.

[13] D0 Collaboration, Abazov V.M. et al., preprint FERMILAB-PUB-08/062-E,
accepted by Phys. Lett. B.

[14] CDF Collaboration, Aaltonen T. et al., Phys. Rev. D, 77 (2008) 011108.

[15] Campbell J. and Ellis R.K., Phys. Rev. D, 65 (2002) 113007.



[16] CDF Note 9202,
http://www-cdf.fnal.gov/physics/new/top/2008/tprop/TopFCNC_v1.5/

[17] Balazs C. and Yuan C.P., Phys. Rev. D, 56 (1997) 5558.

[18] http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/HIGGS/H15/

[19] Private communications with Mangano M.L.

[20] Gleisberg T. et al., J. High Energy Phys., 0402 (2004) 056.

[21] D0 Note 5066,
http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/HIGGS/H15/

[22] CDF Collaboration, Aaltonen T. et al., Phys. Rev. Lett., 100 (2008)
102001.

[23] CDF Note 9221,
http://www-cdf.fnal.gov/physics/new/top/2008/singletop/LF/

[24] D0 Collaboration, Abazov V.M. et al., Phys. Rev. D, 78 (2008) 012005.

[25] E.g., D0 Collaboration, Abazov V.M. et al., Phys. Rev. Lett., 100 (2008)
192004.






